
Intro to Data Analysis

Utah Valley University - BIOL3100

Course Syllabus

Course file repository

Shared Course Notes

(Anyone with the link can edit)

Exams GitHub Repository (Exams will be uploaded at the appropriate times)

The philosophy of this course

R for Data Science Website

Collection of beginner R resources

Amy Willis’ Intro to R course

(for related alternative exercises/lessons)

Big Book of R: a collection of free R books …whoa!


Table of Contents

Week 1 | Week 5 | Week 9 | Week 13

Week 2 | Week 6 | Week 10 | Week 14

Week 3 | Week 7 | Week 11 | Week 15

Week 4 | Week 8 | Week 12 | Week 16


The Command Line, File Paths, Git

Week 1

Topics:

  • Installing Software | Command-line | Git version control

Assignments

  • Read: What is Git all about?
  • Install Git, R, and RStudio on your laptop (part of Assignment 1)
  • Be ready to explain what Git, R, and RStudio are.
  • Do Assignment 1 and upload a link to your new GitHub account to Canvas.
  • Take a look at this document to see where this class is going
  • Go through ALL the resources below. I put them here for a reason. Most are short web resources or videos (some of which I made).

Resources

Practice

  • Make 10 more separate changes and commits to your README.md file and push each one to GitHub
  • Close and open your command line terminal 10 times
  • Open your command line terminal and navigate to your new personal GitHub repository for this course (Data_Course_LASTNAME), then navigate back to your Desktop. From your Desktop (without using “cd”), display the contents of Data_Course_LASTNAME/README.md on your screen.
  • Please watch this short clip from “The Karate Kid” (seriously)
    • When I tell you to close and open your command line 10 times, it’s not because I hate you.
    • It’s because I, too, have had to learn this stuff from scratch
    • It’s because I know that repetition is crucial to learning this, especially at the beginning
    • And it’s because if you don’t spend the time to do this stuff over and over now, by week 6 you will be drowning and helpless.
    • When I say “push 10 separate commits to your GitHub repo,” what I’m actually saying is “Show me ‘Paint the Fence’!”
    • Because very soon, Mr. Miyagi will be attacking you with things like “Error in url[i] = paste(df[,2], gsub(" ","_", : object of type 'closure' is not subsettable”
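Here is a minimal sketch of the navigation practice above, written for a Bash-style terminal (macOS Terminal, Linux, or Git Bash on Windows). It assumes your repository sits directly on your Desktop. Your real paths will differ, so the script builds a throwaway stand-in folder with mktemp instead of touching your real files. The step to notice is the last one: cat accepts a relative path, so you can print a file inside a folder without moving into that folder first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for your home folder, deleted when the script exits.
# (Assumption: on your laptop the repo is ~/Desktop/Data_Course_LASTNAME.)
fake_home=$(mktemp -d)
trap 'rm -rf "$fake_home"' EXIT
mkdir -p "$fake_home/Desktop/Data_Course_LASTNAME"
printf '# Data_Course_LASTNAME\nNotes for BIOL3100\n' \
  > "$fake_home/Desktop/Data_Course_LASTNAME/README.md"

# 1. Navigate into the course repository and check where you are.
#    (On your laptop: cd ~/Desktop/Data_Course_LASTNAME)
cd "$fake_home/Desktop/Data_Course_LASTNAME"
pwd

# 2. Navigate back up to the Desktop; ".." means "the parent directory".
cd ..
pwd

# 3. Without using cd again, print README.md by giving cat a path
#    relative to where you are now (the Desktop).
cat Data_Course_LASTNAME/README.md
```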



Week 2

Topics

  • File paths | Pipes | Redirection | Wildcards | Essential Unix Commands | Parameters

Assignments

Resources

Practice

  • Exercise 1
  • Exercise 2
  • Shell find exercise
  • Pick some random files in our Data/ directory and practice (a bunch of times) using the following commands (with various flags and linking them with pipes):
    • head
    • tail
    • wc
    • ls
    • sort
    • uniq
  • Practice using wildcards (*) to show the contents of random files
    • Write a command using wildcard(s) that lists only the files “ethane.pdb” and “methane.pdb” in the Data/data-shell/molecules/ directory
    • Pick a few random files in a directory and try to write a wildcard expression that shows ONLY those files (wash, rinse, repeat)
  • Find out what the following shell commands do and how to use them (then practice using them):
    • touch
    • cp
    • mv
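The commands in this list are easiest to learn by chaining them together. Below is a hedged sketch for a Bash-style terminal. It can't assume where your copy of Data/ lives, so it builds a temporary stand-in for Data/data-shell/molecules/ that uses the same style of file names but has made-up contents (the real .pdb files hold atom coordinates). It covers pipes (|), redirection (>), sort | uniq -c, the * and ? wildcards, and touch/cp/mv. The ethane/methane wildcard exercise is left for you.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary stand-in for Data/data-shell/molecules/ (hypothetical contents).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/molecules"
cd "$work/molecules"

# Give each dummy file a made-up number of lines like "ATOM cubane 1".
for entry in cubane:20 ethane:12 methane:9 octane:30 pentane:21 propane:15; do
  name=${entry%%:*}   # text before the colon
  n=${entry##*:}      # text after the colon
  seq "$n" | sed "s/^/ATOM $name /" > "$name.pdb"
done

# head / tail: first 3 and last 2 lines of one file
head -n 3 cubane.pdb
tail -n 2 cubane.pdb

# Pipe: count lines in every .pdb file, sort numerically, keep the shortest
wc -l *.pdb | sort -n | head -n 1

# Redirection: ">" saves output to a file (overwriting it) instead of printing it
wc -l *.pdb > lengths.txt
sort -n lengths.txt | tail -n 2   # longest file, then wc's "total" line

# sort | uniq -c: uniq only merges ADJACENT duplicate lines, so sort first
cut -d ' ' -f 2 *.pdb | sort | uniq -c

# Wildcards: * matches any run of characters (including none), ? exactly one
ls p*.pdb          # matches pentane.pdb and propane.pdb
ls ?ctane.pdb      # matches octane.pdb

# touch / cp / mv
touch notes.txt                      # create an empty file (or update its timestamp)
cp notes.txt notes_backup.txt        # copy: now both files exist
mv notes_backup.txt old_notes.txt    # move/rename: notes_backup.txt is gone
ls *.txt
```

Try swapping in different flags (for example wc -w, sort -r, or head -n 1) and watch how each change alters what comes out the end of the pipe.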


Getting to Know R

Week 3

Topics

  • R Data types and conversions | Reading and Writing Files | Packages and Projects

Assignments

  • Watch this video on how to set up your personal Git Directory and R-Projects for completing assignments
  • Read this chapter on what a “package” is in R
  • Read this chapter on R-Projects (We will ALWAYS work from within R-Projects from now on)
  • Do Assignment 3 (We will start this one together during live class)
  • Do Assignment 4

Resources

Practice



Week 4

Topics

  • Logical Operations | Subsetting | “Grammar of Graphics” ggplot Intro

Assignments

  • Read through the materials in the Resources section below
  • Make sure you have the following packages installed if you don’t already:
    • tidyverse
    • carData
    • RColorBrewer
    • colorblindr (this one is “special”: it is not on CRAN, so follow the link for installation instructions)
  • Assignment 5

Resources

Practice


Visualizing a Data Set

Week 5

Topics

  • ggplot | patchwork | ggforce

Assignments

Resources

Practice


Clean and Transform Data

Week 6

Topics

  • Tidy Data | dplyr verbs | tidyr verbs

Assignments

Resources

Practice



Week 7

Topics

Assignments

  • Read This Handout
  • Read This Paper
  • Create a validated Excel spreadsheet for data collection for the experiment we design in class
  • Exam 2 is open…

Resources

Practice


Getting More From R

Week 8

Topics

  • Writing Functions | Conditional Execution | source()

Assignments

  • Watch this video from Jenny Bryan about debugging
  • Read this chapter and do all the exercises in it as you read

Resources

Practice

  • Write a function that returns the min, max, and mean of any set of real numbers
  • Write a function that takes a data frame and returns a new data frame with one random column removed
  • Fix my out-of-order code for a summarizing function
  • Write a function that takes a data frame: if it has more than 3 columns, return the column names as-is; if it has 3 or fewer columns, return the column names in reverse order.
  • Write a useful function that you might want to use in the future (your choice)
  • Put all of these functions into a new R script and save it in your main data course repository
  • In a new empty R script, call your functions with source() and test them out
  • There’s a stupid function I wrote in “/Code_Examples/thlayli.R”
    • It takes a data.frame as an input and does WHAT to it?
  • What’s the difference between ‘|’ and ‘||’, or between ‘&’ and ‘&&’? Why should you always use the double form in a conditional expression?


Model Building and Testing

Week 9

Topics

  • Building and Testing Models

Assignments

Resources

Practice



Week 10

Topics

  • More models | Statistical Tests

Assignments

  • Show up to class. Models are confusing at first and there’s a lot to learn.
  • Ask questions during class.

Resources

Practice

  • Go through the R script “/Exercises/more_models.R”
    • Follow along with my analyses of the first two data sets
    • Complete an analysis of the third data set


Communicating Your Results

Week 11

Topics

  • R Markdown | Reproducible Reports

Assignments

Resources

Practice

  • Using the resources above, generate a markdown document that analyzes the “iris” data set and push it to a new GitHub repository named Iris_Markdown
  • Play with options and code to create a document that looks good and presents your analysis and results clearly
  • This is similar to Assignment_9, but I’m asking for a brand new “Iris_Markdown” repository that is a self-contained report of Iris analyses



Week 12

Topics

  • Proper Project Organization | Collaboration

Assignments

  • Peer evaluation of Assignment 9 HTML reports (Organization, Portability, Accuracy, Understandability)

Resources

Practice

  • Peer-evaluate the Iris_Markdown repositories from last week, then clean them up and make them better organized


Putting it all together

Week 13

Topics

  • Data Analysis from raw to report

Assignments

  • We will work together in class to do a complete analysis in real time

Resources

Practice

  • Analyze the “esoph” data set and generate a markdown report



Week 14

Topics

  • Building a website with GitHub and R Markdown

Assignments

  • Work on Final Project
  • Create a GitHub Personal Website
  • Upload a brief CV and the updated (improved) HTML of Assignment 9 to your new website

Resources

Practice

  • Go through my course website repository (link above) and try to relate the code there to the HTML version of the website your browser displays
  • Work on your personal website:
    • Add multiple pages with internal links
    • Be sure to have a “Projects” page that links to HTML reports you’ve made, including your final project
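    • Be careful not to push any files larger than 50 MB to GitHub: GitHub warns about files over 50 MB, rejects pushes containing files over 100 MB, and a large file that is already in your commit history is painful to remove.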
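One way to catch oversized files before you commit and push is the standard find command. This is a minimal sketch for a Bash-style terminal. The folder and file names are hypothetical, and the 60 MB file is just zeros written with dd so there is something to find. Adding the match to .gitignore keeps Git from picking the file up in a future "git add". It does not remove a file you have already committed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for a website repository (hypothetical layout).
site=$(mktemp -d)
trap 'rm -rf "$site"' EXIT
cd "$site"
mkdir data
echo "<h1>Projects</h1>" > projects.html
# 60 MB of zeros as a stand-in for a large raw data file.
dd if=/dev/zero of=data/big_raw_data.csv bs=1048576 count=60 2>/dev/null

# List regular files over 50 MB, skipping Git's internal .git folder.
find . -path ./.git -prune -o -type f -size +50M -print

# Ask Git to ignore them (strip the leading "./" that find prints).
find . -path ./.git -prune -o -type f -size +50M -print \
  | sed 's|^\./||' >> .gitignore
cat .gitignore
```

In your real repository, run the first find line from the repository's top folder before each push.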



Week 15

Topics

  • Intro to genetic data in R

Assignments

  • Work on Final Project
  • Assignment 10 (working with DNA data in R)



Week 16

Topics

  • TBD

Assignments

  • Exam 4 (redo any previous exam to replace its score)




‘Luck is statistics taken personally.’ – Penn Jillette